Using Natural Language Processing, LocusLink And The Gene Ontology To Compare OMIM To MEDLINE
Authors
Abstract
Researchers in the biomedical and molecular biology fields are faced with a wide variety of information sources. These are presented in the form of images, free text, and structured data files that include medical records, gene and protein sequence data, and whole genome microarray data, all gathered from a variety of experimental organisms and clinical subjects. The need to organize and relate this information, particularly concerning genes, has motivated the development of resources such as the Unified Medical Language System, Gene Ontology, LocusLink, and the Online Mendelian Inheritance in Man (OMIM) database. We describe a natural language processing application to extract information on genes from unstructured text and discuss ways to integrate this information with some of the available online resources.
Similar resources
The GENIA Corpus: an Annotated Research Abstract Corpus in Molecular Biology Domain
With the information overload in the genome-related field, there is an increasing need for natural language processing technology to extract information from the literature, and various attempts at information extraction using NLP have been made. We are developing the necessary resources, including a domain ontology and an annotated corpus from research abstracts in the MEDLINE database (GENIA corpus). We are ...
Using Natural Language Processing and the Gene Ontology to Populate a Structured Pathway Database
Reading literature is one of the most time-consuming tasks a busy scientist has to contend with. As the volume of literature continues to grow, there is a need to sort through this information in a more efficient manner. Mapping the pathways of genes and proteins of interest is one goal that requires frequent reference to the literature. Pathway databases can help here and scientists currently h...
Presenting a method for extracting structured domain-dependent information from Farsi Web pages
Extracting structured information about entities from web texts is an important task in web mining, natural language processing, and information extraction. Information extraction is useful in many applications including search engines, question-answering systems, recommender systems, machine translation, etc. An information extraction system aims to identify the entities from the text and extr...
Combining terminologies and ontologies to integrate biomedical information
The post genomics era is characterized by huge amounts of biomedical information, distributed in multiple databanks (e.g. SWISS-PROT, OMIM, LocusLink, GenBank, as well as many others). Despite recent efforts to provide standard ontologies such as Gene Ontology, semantic heterogeneity is a major obstacle to information integration. Each databank has its own identifiers for genes and gene product...
Mining Terminological Knowledge in Large Biomedical Corpora
Terminological knowledge of the biomedical domain is important for natural language processing (NLP) and information retrieval (IR) applications, and a number of terminological knowledge sources, such as LocusLink, GenBank, and the UMLS, already exist. However, because of the tremendous amount of research activity in the field, new terms and symbols are continually being created, many of which...